Comparative wavelet and MFCC speech recognition experiments on the Slovenian and English SpeechDat2
Abstract
The main motivation for this project was to study the performance of non-linear speech analysis methods in automatic speech recognition. Specifically, we selected the wavelet transform as a promising non-linear tool for signal analysis that has already been applied successfully in many tasks, such as image recognition and compression, where it led to standards such as JPEG2000. The plan was to carry out a comparative analysis of the standard mel–cepstral and wavelet-based feature sets and to evaluate the baseline speech recognition rates of these two parameterization methods. We start with a brief description of the Fourier and wavelet transforms from the perspective of joint time–frequency analysis, focusing on the localization properties of the two transforms. The ability of a transform to capture short-time events properly is determined by the localization capabilities of its basis functions and is one of the prerequisites for successful application in speech processing. The Fourier transform offers constant time–frequency resolution, whereas the wavelet transform provides better frequency resolution at low frequencies and better time localization of transient phenomena [1]. This closely resembles the first stage of human auditory perception [2] and basilar membrane excitation [3], in that the wavelet transform introduces roughly logarithmic frequency sensitivity. We carried out comparative within-language and cross-language experiments on the Slovenian and English SpeechDat2 [4] databases using the standard mel–cepstral and the wavelet-based feature sets. The automatic speech recognition tool was the reference recogniser [5,6], which is built around the HTK toolkit. This enabled us to conduct controlled experiments on six different subsets of the SpeechDat2 vocabularies (yes/no sentences, city names, phonetically rich words, digits, etc.).
2. Wavelet Packet Parametrization
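The resolution trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' wavelet packet parametrization: the Haar wavelet is used only because it is the simplest orthonormal wavelet, the helper names are hypothetical, and the 8 kHz sampling rate and 256-sample frame are assumptions chosen for illustration. The sketch compares the uniform bin width of a fixed-length Fourier analysis with the octave-wide bands produced by repeatedly splitting the low-pass branch of a discrete wavelet transform.

```python
import numpy as np


def haar_dwt_level(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even")
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)  # low-pass half-band
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)  # high-pass half-band
    return approx, detail


def dwt_bands(sample_rate, levels):
    """Nominal (low, high) band in Hz of each detail level, then the final approximation."""
    bands = []
    high = sample_rate / 2.0
    for _ in range(levels):
        bands.append((high / 2.0, high))  # detail keeps the upper half of the band
        high /= 2.0                       # approximation is split again
    bands.append((0.0, high))
    return bands


def fourier_bin_width(sample_rate, frame_length):
    """Every bin of a fixed-length Fourier analysis has the same width in Hz."""
    return sample_rate / frame_length


if __name__ == "__main__":
    sample_rate = 8000  # assumed sampling rate, for illustration only
    print("Fourier bin width (Hz):", fourier_bin_width(sample_rate, 256))
    for low, high in dwt_bands(sample_rate, 3):
        print(f"DWT band: {low:7.1f} to {high:7.1f} Hz")
```

Each further split halves the bandwidth of the remaining low-frequency branch, which is the roughly logarithmic frequency resolution the abstract refers to. A wavelet packet decomposition would additionally allow the high-pass branches to be split.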
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
SPE based selection of context dependent units for speech recognition
The decision tree-based approach is a well-known and frequently used method for tying the states of context-dependent phone models, since it can provide good models for contexts not encountered in the training data. In contrast to other approaches, this method allows us to include expert linguistic knowledge in the system. Our research focused on the inclusion of standard generative the...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states from speech in real environments. For this task, we employ power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Mel Frequency Discrete Wavelet Coefficients for Kannada Speech Recognition using PCA
In this paper, a new scheme for the recognition of isolated words in Kannada speech, based on the Discrete Wavelet Transform (DWT) and PCA, is proposed. First, the DWT of the speech is computed and then MFCC coefficients are calculated. Principal Component Analysis is then applied for speech recognition. This paper also presents the comparative results with respect to th...
New Filter Structure based on Admissible Wavelet Packet Transform for Text-Independent Speaker Identification
Identical acoustic features, such as Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), are widely used for different tasks such as speech recognition and speaker recognition, although the requirements of speaker recognition differ from those of speech recognition. In the MFCC feature representation, the Mel frequency scale is used to get a high resolutio...
Publication date: 2003